2023-12-12 09:36:31.AIbase.4.1k
Researchers Successfully Induce AI Chatbot to Reveal Harmful Content
Researchers at Purdue University in Indiana designed a new method to successfully induce large language models to generate harmful content. They discovered potential dangers hidden within compliance responses. By using probabilistic data and soft labels to prompt the model, they achieved a success rate of up to 98%. The researchers caution the AI community to be prudent in open-sourcing language models and suggest that removing harmful content is a better solution.